---
title: "econ401 final project R Notebook"
output: html_notebook
---

Libraries:
```{r libraries, results = "markup"}
library(tidyverse)   # attaches readr, dplyr, stringr, and ggplot2, among others
library(rvest)
library(readxl)
library(ggthemes)
library(data.table)
library(geojsonio)
library(leaflet)
library(rgdal)       # only needed for the commented-out writeOGR() call
library(haven)
```

Data:
```{r data, results = "markup"}
# GDP up to Feb 2020
# https://ihsmarkit.com/products/us-monthly-gdp-index.html
gdp.index.data <- readxl::read_xlsx('US-Monthly-GDP-History-Data.xlsx', sheet = 3)
gdp.index <- gdp.index.data
colnames(gdp.index)[1] <- "Y_M"
year.month <- str_split_fixed(gdp.index$Y_M, ' - ', 2)
colnames(year.month) <- c('Year', 'Month')
gdp.index <- cbind(year.month, gdp.index[, -1])
gdp.annual <- gdp.index %>%
  group_by(Year) %>%
  summarize(MaxGDP = max(`Monthly Real GDP Index`),
            MinGDP = min(`Monthly Real GDP Index`))

# https://nces.ed.gov/programs/digest/d18/tables/dt18_306.10.asp
enrollment.data <- read_xls('tabn306.10.xls')
enrollment <- enrollment.data[1:12]
# enrollment is in thousands
enrollment <- enrollment[-c(1, 3, 15, 27, 39, 51, 63, 75, 99, 111, 123, 135:139), ]
col1 <- data.frame(str_remove_all(enrollment[[1]], '\\.'), stringsAsFactors = FALSE)
col1[2, 1] <- "All_Students"
enrollment <- cbind(col1, enrollment[, -1])
enrollment <- t(enrollment)
rownames(enrollment) <- c()
colnames(enrollment) <- enrollment[1, ]
enrollment <- data.frame(enrollment)
colnames(enrollment)[1] <- 'Year'
enrollment <- enrollment[-1, ]
Years <- as.numeric(str_extract(enrollment$Year, "[:digit:]{4}"))
enrollment <- cbind(Years, enrollment[, -1])
enrollment <- data.frame(lapply(enrollment, function(x){ 
  gsub("---", NA, x)
}))
str(enrollment)

enrollment1 <- enrollment[, 1:2]
gdp.annual1 <- gdp.annual
all.students <- as.numeric(enrollment[4:11, 2])

gdp.annual$Year <- as.factor(gdp.annual$Year)
enrollment1$Years <- as.factor(enrollment1$Years)
enrollment1$All_Students <- as.numeric(as.character(enrollment1$All_Students))
```

Model 3:

  First making a graph of annual GDP highs and lows -- maybe a temporary proxy for recessions?
```{r graph1, include = FALSE}
gdp.annual %>%
  ggplot() +
  geom_line(mapping = aes(x = Year,
                 y = MaxGDP,
                 group = 1)) +
  geom_line(mapping = aes(x = Year,
                          y = MinGDP,
                          group = 1)) +
  theme_economist() +
  ylab('Real GDP')
```
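
One way the highs and lows could be turned into a rough recession flag is sketched below. This is only an illustration on made-up numbers: the table `gdp_toy`, the column `Downturn`, and the rule itself (flag a year whose high is below the previous year's high) are assumptions, not something taken from the GDP data above.

```r
# Toy annual table shaped like gdp.annual; all index values are made up.
gdp_toy <- data.frame(Year = 2006:2010,
                      MaxGDP = c(100, 103, 104, 101, 105),
                      MinGDP = c(97, 100, 99, 98, 101))

# Previous year's high, with NA for the first year (no earlier year to compare).
prev_max <- c(NA, head(gdp_toy$MaxGDP, -1))

# Hypothetical rule: a year is a "downturn" when its high is below last year's high.
gdp_toy$Downturn <- !is.na(prev_max) & gdp_toy$MaxGDP < prev_max

gdp_toy$Year[gdp_toy$Downturn]
```

Other rules (for example, comparing each year's low with the previous year's low) would flag different years, so the choice of rule would need to be checked against known recession dates.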

  Annual enrollment graph:
```{r graph2, include = FALSE}
enrollment1 %>%
  ggplot() +
  geom_line(mapping = aes(x = Years,
                          y = All_Students,
                          group = 1)) +
  theme_economist() +
  ylab('Enrollment')
```

  Join the enrollment and GDP data, then fit a test linear model:
```{r lm1, include = FALSE}
test <- inner_join(enrollment1, gdp.annual1,
          by = c("Years" = "Year"))

lm1 <- lm(All_Students ~ MaxGDP,
          data = test,
          na.action = na.omit)
summary(lm1)
```
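
The two key columns are prepared separately in the data chunk, so they may not share a type when they reach the join. A minimal sketch of aligning the keys first, using made-up stand-ins (`enroll_toy` and `gdp_toy` are hypothetical, not the real tables):

```r
library(dplyr)

# Toy stand-ins for enrollment1 and gdp.annual1; all values are made up.
enroll_toy <- data.frame(Years = c(2016, 2017, 2018),
                         All_Students = c(100, 110, 105))
gdp_toy <- data.frame(Year = c("2016", "2017", "2018", "2019"),
                      MaxGDP = c(17.0, 17.5, 18.0, 18.4),
                      stringsAsFactors = FALSE)

# Convert the character key to numeric so both sides of the join share a type.
gdp_toy$Year <- as.numeric(gdp_toy$Year)

# inner_join keeps only years present in both tables (2019 is dropped here).
joined_toy <- inner_join(enroll_toy, gdp_toy, by = c("Years" = "Year"))
```

Keeping the year numeric also lets ggplot treat it as a continuous axis, which would make `group = 1` unnecessary in the line plots.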

  Graph it?
```{r graph3_4, include = FALSE}
test %>%
  ggplot() +
  geom_line(aes(x = Years,
                 y = All_Students,
                group = 1)) +
  theme_economist() +
  ylab('Enrollment by All Students')

test %>%
  ggplot() +
  geom_line(aes(x = Years,
                 y = MaxGDP,
                 group = 1)) +
  # geom_abline(slope = 0.8015, intercept = 6316.7207) +
  theme_economist()
```

College Proximity Question 5/3:
(Reading in Ivy's data)
```{r read proximity data, results = "markup"}
cz_college <- read_dta("cz_college.dta")
cz <- read_dta('cz.dta')
colleges <- read_dta('colleges.dta')
mobility.results <- read_xlsx('mobility_results.xlsx')
```

Read in/create mobility data:
(Trends in Mobility: Commuting Zone Intergenerational Mobility Estimates by Birth Cohort)
https://opportunityinsights.org/data/?geographic_level=101&topic=0&paper_id=0#resource-listing
```{r mobility data}
mobility.data <- read_xls('onlinedata1_trends.xls')
colnames(mobility.data) <- mobility.data[15, ]
mobility <- mobility.data[-c(1:16), ]
mobility.1986 <- mobility %>%
  filter(`Birth Cohort` == 1986)
mobility.1986$`Commuting Zone` <- as.numeric(mobility.1986$`Commuting Zone`)
cz.mobility.data <- full_join(mobility.1986[, c(1, 3:8)],
                  cz,
                  by = c(`Commuting Zone` = 'cz'))
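# Columns are selected by position. An earlier grep('ncollege', colnames(cz.mobility))
# returned 2132, so positions 2132:2137 appear to start at the ncollege column.
cz.mobility <- cz.mobility.data[, c(1:8, 2132:2137)]
# Reorder the columns that were kept
cz.mobility <- cz.mobility[, c(1, 8, 9:14, 3:7, 2)]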
# write_csv(cz.mobility, 'cz.mobility.csv')
```

Read in geojson file:
```{r geojson}
# Read the 1990 commuting zone shapes; the map below needs cz.geojson to exist
cz.geojson <- geojson_read("cz1990.json",
                           what = "sp")
# View(cz.geojson@data)
cz.geojson %>%
  leaflet() %>%
  # addTiles() %>%
  addPolygons() %>%
  setView(-96, 37.8, 3)
```

Commuting zones on the map (cz.geojson@data) are in 1990s format. They need to be converted so our post-2000 data can be connected to the shapefiles:
(https://www.ers.usda.gov/data-products/commuting-zones-and-labor-market-areas/)
```{r cz shape combine, include=FALSE}
cz.conversions <- read_xls('cz00_eqv_v1.xls')
cz.conversions <- cz.conversions[, c(2:4)]
colnames(cz.conversions) <- c('cz2000', 'cz1990', 'cz1980')
# Convert every ID column to numeric (previously the 2000 IDs were left unconverted)
cz.conversions[] <- lapply(cz.conversions, as.numeric)
# Keep one row per 1990 zone (the first 2000 ID listed for it) so the joins
# below cannot add rows to cz.geo@data
cz.conversions <- distinct(cz.conversions, cz1990, .keep_all = TRUE)

head(cz.geojson@data)
cz.geo <- cz.geojson
colnames(cz.mobility)[1] <- 'cz2000'

cz.geo@data <- left_join(cz.geo@data,
                  cz.conversions[, -3],
                  by = c('cz' = 'cz1990'))
cz.geo@data <- left_join(cz.geo@data,
                         cz.mobility,
                         by = 'cz2000')
# writeOGR(cz.geo, 'cz.geo', layer = 'cz.geo', driver = 'GeoJSON')

```
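
A crosswalk with repeated 1990 IDs would add rows to the polygon attribute table during the join, and an `sp` object needs exactly one data row per polygon. The sketch below shows the effect on made-up IDs; `xwalk_toy` and `shapes_toy` are hypothetical stand-ins, and whether the real crosswalk repeats IDs is an assumption to check.

```r
library(dplyr)

# Toy crosswalk with a repeated 1990 zone; all IDs are made up.
xwalk_toy <- data.frame(cz2000 = c(1, 1, 2, 3),
                        cz1990 = c(10, 10, 20, 30))

# Toy polygon attribute table keyed by 1990 zone, one row per polygon.
shapes_toy <- data.frame(cz = c(10, 20, 30))

# Joining the raw crosswalk repeats zone 10, so the result has an extra row.
naive <- left_join(shapes_toy, xwalk_toy, by = c("cz" = "cz1990"))

# De-duplicating on the 1990 ID first keeps one row per polygon.
xwalk_unique <- distinct(xwalk_toy, cz1990, .keep_all = TRUE)
safe <- left_join(shapes_toy, xwalk_unique, by = c("cz" = "cz1990"))

c(naive = nrow(naive), safe = nrow(safe))
```

Comparing `nrow(cz.geo@data)` before and after each join is a quick way to confirm that no rows were added.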

Try to run some lms:

```{r}
# Placeholder: lm() needs a formula and data before this will run
# lm1 <- lm()

```
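
As a starting point, a regression of a mobility measure on the number of nearby colleges might look like the sketch below. Everything here is illustrative: the data are made up, and `mobility_toy`, `mobility_rank`, and the use of `ncollege` as the regressor are assumptions rather than the model intended for this project.

```r
# Toy commuting-zone table; all values are made up.
mobility_toy <- data.frame(ncollege = c(0, 1, 2, 3, 4),
                           mobility_rank = c(0.30, 0.33, 0.35, 0.40, 0.41))

# Simple linear model: hypothetical mobility outcome on college count.
fit_toy <- lm(mobility_rank ~ ncollege, data = mobility_toy)

# Estimated intercept and slope
coef(fit_toy)
```

Using a new object name such as `fit_toy` avoids overwriting the `lm1` fitted earlier in the notebook.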



